Exercise 1: Preparation

In the load_data chunk we need to load all the data we will need for the analysis. We have 2 data sources:

The data is stored respectively in 3 data.frames: df.design, df.metabolomics.raw and df.metabolomics.design.

Exercise 1.1

First thing to do when encountering a new dataset is to look at it’s shape and properties.

## Observations: 28,950
## Variables: 11
## $ bucket          <chr> "344.07941 Da 18.36 s", "344.07941 Da 18.36 s", "344.…
## $ RT              <dbl> 0.31, 0.31, 0.31, 0.31, 0.31, 0.31, 0.63, 0.63, 0.63,…
## $ mz_ratio        <dbl> 345.0867, 345.0867, 345.0867, 345.0867, 345.0867, 345…
## $ Name            <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Formula         <chr> "C20H12N2O4", "C20H12N2O4", "C20H12N2O4", "C20H12N2O4…
## $ phase           <chr> "apolar", "apolar", "apolar", "apolar", "apolar", "ap…
## $ mass            <dbl> 344.0794, 344.0794, 344.0794, 344.0794, 344.0794, 344…
## $ charge          <dbl> 0.9970811, 0.9970811, 0.9970811, 0.9970811, 0.9970811…
## $ metabolomics_id <chr> "202002_TTP_SILS2665_RA_1_1 _merged_", "202002_TTP_SI…
## $ value           <dbl> 10066, 9210, 9564, 8882, 9346, 8106, 3996, 4116, 4132…
## $ value_log10     <dbl> 4.002900, 3.964307, 3.980685, 3.948560, 3.970672, 3.9…

Question 1.1

What is in data.frame and what do you think is the meaning of each column? Is the data in a tidy format?

Feature Statistics

As you might have notice, some of the features have a formula and/or a label associated to it but other don’t. In addition, there are features that have the same formula and/or label. Create a figure showing the percentage of features with a formula and with a name for each phase and a table showing the total number of features and with names and formulas and how many of those are unique for each phase.

phase total_count n_formula n_uniq_formula n_label n_uniq_label
apolar 11742 8460 1237 108 16
polar 17208 8718 1240 660 89

Distribution of Raw Values

Number of Zeros

Total Intensity

MA Plots

Normalizing

df.metabolomics <- df.metabolomics.raw %>%
  group_by(phase, bucket) %>%
  mutate(
    value_norm = ifelse(value <= 0, min(value[value > 0], na.rm = T) / 2, value),
    value_norm_log10 = log10(value_norm)
  ) %>%
  group_by(phase)

Total Intensity

MA Plots

PCA

Loadings of PC 1

Most down-regulated metabolites of PC 1:

phase bucket loading
apolar 870.56984 Da 803.03 s -0.1775119
apolar 892.54105 Da 803.03 s -0.1252981
apolar 551.42610 Da 842.78 s -0.1207451
apolar 396.00441 Da 41.68 s -0.1011673
apolar 886.56398 Da 850.98 s -0.0993336
apolar 792.56643 Da 505.34 s -0.0991903
apolar 349.96167 Da 41.67 s -0.0961210
apolar 1052.52432 Da 807.07 s -0.0849786
apolar 381.98812 Da 41.95 s -0.0791344
apolar 335.94565 Da 41.39 s -0.0776370
polar 202.16809 Da 35.67 s -0.0963273
polar 545.16130 Da 170.88 s -0.0959617
polar 143.09457 Da 35.95 s -0.0698610
polar 551.17860 Da 52.61 s -0.0688578
polar 1389.01032 Da 439.32 s -0.0683566
polar 279.13232 Da 47.75 s -0.0671033
polar 1275.92760 Da 432.91 s -0.0634677
polar 491.44838 Da 1179.75 s -0.0631968
polar 505.17065 Da 53.87 s -0.0624037
polar 198.08906 Da 403.32 s -0.0621769

Most up-regulated metabolites of PC 1:

phase bucket loading
apolar 195.05665 Da 40.87 s 0.2267028
apolar 412.09570 Da 40.31 s 0.1440448
apolar 175.96327 Da 18.43 s 0.1026345
apolar 467.00597 Da 39.82 s 0.1018463
apolar 131.97352 Da 18.49 s 0.0874081
apolar 243.95081 Da 24.01 s 0.0765888
apolar 379.92616 Da 20.53 s 0.0744116
apolar 311.93825 Da 20.61 s 0.0734376
apolar 243.95122 Da 70.97 s 0.0706117
apolar 379.92574 Da 33.46 s 0.0665436
polar 195.05672 Da 46.20 s 0.1339618
polar 780.22632 Da 47.97 s 0.1121678
polar 684.08708 Da 50.27 s 0.1023996
polar 330.11082 Da 47.20 s 0.0893236
polar 343.95574 Da 45.53 s 0.0657282
polar 443.02423 Da 50.64 s 0.0651033
polar 545.16084 Da 190.07 s 0.0634573
polar 218.04115 Da 45.53 s 0.0630638
polar 507.04174 Da 50.76 s 0.0581213
polar 553.82966 Da 26.74 s 0.0551570

Heatmap

Another useful tool is to plot the expression levels in a heatmap.

T-test

Volcano Plot

List of Significant Metabolites

Finally print a table with all interesting buckets.

Polar Phase

Metabolites with adj P-value < 0.01 and absolute log ratio > 2
phase bucket Name Formula logFC p.value padj
polar 1389.01032 Da 439.32 s NA C55H112N44 -4.555173 0.0000000 0.0000000
polar 1275.92760 Da 432.91 s NA C70H134NO14PS -4.215218 0.0000000 0.0000000
polar 505.17065 Da 53.87 s NA C13H36N3O13PS -4.305261 0.0000000 0.0000000
polar 113.08366 Da 47.02 s NA NA 3.256798 0.0000000 0.0000000
polar 248.97570 Da 51.41 s NA NA 3.654155 0.0000000 0.0000000
polar 162.05319 Da 53.18 s NA C6H10O5 -3.078141 0.0000000 0.0000000
polar 99.06825 Da 48.34 s NA C5H9NO 3.753294 0.0000000 0.0000000
polar 324.10542 Da 52.62 s NA C17H16N4OS -3.165424 0.0000000 0.0000000
polar 557.08069 Da 53.14 s NA NA -3.012152 0.0000000 0.0000000
polar 135.05421 Da 46.92 s NA NA -4.030530 0.0000000 0.0000000
polar 653.80118 Da 35.63 s NA C15H6N6O8S8 -2.936358 0.0000000 0.0000000
polar 162.05294 Da 48.25 s NA C6H10O5 -4.163746 0.0000000 0.0000000
polar 307.01122 Da 49.47 s NA C8H9N3O8S -2.829478 0.0000000 0.0000000
polar 118.02789 Da 46.85 s NA C5H2N4 -2.469851 0.0000000 0.0000000
polar 218.04115 Da 45.53 s NA C5H10N6S2 4.446394 0.0000000 0.0000000
polar 195.05672 Da 46.20 s 2-_N-morpholino_ethanesulfonic acid C6H13NO4S 9.585964 0.0000000 0.0000000
polar 281.91637 Da 34.82 s NA C8H2N4S4 -2.654038 0.0000000 0.0000000
polar 684.08708 Da 50.27 s NA C19H28N10O10S4 7.063308 0.0000000 0.0000000
polar 364.09957 Da 47.62 s NA C19H16N4O2S -2.058190 0.0000000 0.0000000
polar 851.81735 Da 35.69 s NA NA 2.005496 0.0000000 0.0000000
polar 443.02423 Da 50.64 s NA C21H10N5O3PS 4.554010 0.0000000 0.0000000
polar 54.04650 Da 51.02 s NA NA -2.164031 0.0000000 0.0000000
polar 80.96480 Da 48.87 s NA NA 2.927677 0.0000000 0.0000000
polar 165.04569 Da 42.94 s NA NA -2.108192 0.0000000 0.0000000
polar 551.17860 Da 52.61 s NA C24H42NO3PS4 -4.568843 0.0000000 0.0000000
polar 182.20346 Da 1169.50 s NA C13H26 -2.225089 0.0000000 0.0000000
polar 506.42937 Da 1163.27 s NA C27H58N2O6 -2.044209 0.0000000 0.0000000
polar 330.11082 Da 47.20 s NA C19H14N4O2 6.473521 0.0000000 0.0000000
polar 666.22332 Da 65.72 s NA C17H44N6O17P2 -2.186947 0.0000000 0.0000000
polar 153.98977 Da 215.00 s NA C6H2O5 -3.892951 0.0000000 0.0000000
polar 135.05435 Da 47.06 s Adenine C5H5N5 -2.533111 0.0000000 0.0000000
polar 468.37287 Da 1179.28 s NA C28H53O3P -3.342201 0.0000000 0.0000000
polar 505.42821 Da 1174.94 s NA C35H55NO -2.521896 0.0000000 0.0000000
polar 254.09980 Da 49.53 s NA C9H18O8 -3.468359 0.0000000 0.0000000
polar 491.44838 Da 1179.75 s NA C27H61N3O2S -2.660801 0.0000000 0.0000000
polar 202.16809 Da 35.67 s NA C10H22N2O2 -2.633177 0.0000000 0.0000000
polar 952.76234 Da 33.05 s NA C47H120N2O3S6 -2.202088 0.0000000 0.0000000
polar 64.00597 Da 46.78 s NA NA -2.522702 0.0000000 0.0000000
polar 279.13232 Da 47.75 s NA C12H25NO2S2 -4.567685 0.0000000 0.0000000
polar 143.09457 Da 35.95 s NA C7H13NO2 -2.300674 0.0000000 0.0000000
polar 302.12622 Da 407.64 s NA C8H22N4O6S -2.043212 0.0000000 0.0000000
polar 84.09360 Da 1174.56 s NA NA -2.219466 0.0000000 0.0000000
polar 446.39040 Da 1170.36 s NA C17H50N8O5 -2.146587 0.0000000 0.0000000
polar 302.06080 Da 66.17 s NA C8H18N2O6S2 2.066950 0.0000000 0.0000000
polar 211.05125 Da 67.62 s NA C6H13NO5S 2.214530 0.0000000 0.0000001
polar 233.03331 Da 68.81 s NA C12H11NS2 2.218716 0.0000000 0.0000001
polar 780.22632 Da 47.97 s NA C27H53N6O8PS5 8.985664 0.0000000 0.0000001
polar 324.10409 Da 67.10 s NA C17H24S3 -2.304131 0.0000000 0.0000001
polar 309.96791 Da 67.81 s NA C8H3N6O4PS 2.248768 0.0000000 0.0000001
polar 418.57380 Da 48.36 s NA NA 2.127005 0.0000000 0.0000002
polar 255.96926 Da 300.67 s NA NA 2.287815 0.0000000 0.0000002
polar 440.14031 Da 52.94 s NA C15H20N8O8 2.329616 0.0000000 0.0000003
polar 636.12403 Da 101.36 s NA C17H44N6O3S8 -2.389652 0.0000000 0.0000003
polar 186.01639 Da 214.63 s NA C7H6O6 -4.846004 0.0000000 0.0000004
polar 636.12253 Da 100.16 s NA C24H35N2O10P3S -2.006664 0.0000000 0.0000004
polar 312.13616 Da 50.15 s NA C19H20O4 2.413914 0.0000001 0.0000007
polar 975.28149 Da 48.06 s NA C33H49N23OS6 2.558661 0.0000001 0.0000010
polar 86.03658 Da 36.94 s NA NA -2.607038 0.0000001 0.0000010
polar 85.05253 Da 37.75 s NA C4H7NO -2.388347 0.0000001 0.0000010
polar 211.05153 Da 67.98 s NA C6H13NO5S 2.855304 0.0000003 0.0000023
polar 195.45403 Da 48.06 s NA NA 2.859702 0.0000004 0.0000028
polar 310.12002 Da 45.46 s NA C11H22N2O6S 2.695707 0.0000006 0.0000039
polar 195.05656 Da 124.38 s NA C6H13NO4S 2.878740 0.0000006 0.0000041
polar 424.11927 Da 45.00 s NA C13H32N2O5S4 3.067000 0.0000007 0.0000043
polar 342.11236 Da 47.50 s NA C19H18O6 4.972082 0.0000007 0.0000046
polar 313.56890 Da 46.09 s NA NA 2.775471 0.0000009 0.0000057
polar 753.53621 Da 865.22 s NA NA -3.140059 0.0000009 0.0000059
polar 72.02134 Da 43.52 s NA C3H4O2 2.305519 0.0000011 0.0000067
polar 390.67583 Da 47.24 s NA NA 3.266031 0.0000012 0.0000072
polar 732.22414 Da 45.42 s NA C19H41N16O7PS3 3.195876 0.0000015 0.0000092
polar 107.98852 Da 67.88 s NA NA 3.430137 0.0000016 0.0000097
polar 110.07262 Da 403.89 s NA NA -2.076471 0.0000018 0.0000106
polar 197.03605 Da 160.38 s NA NA 3.360882 0.0000018 0.0000107
polar 537.17417 Da 52.41 s NA C13H40N5O9PS3 5.490517 0.0000018 0.0000109
polar 684.08553 Da 49.48 s NA C25H9N20O4P 3.249192 0.0000019 0.0000113
polar 192.02681 Da 83.77 s Isocitric acid C6H8O7 -2.885696 0.0000024 0.0000139
polar 626.16820 Da 185.51 s NA C19H47O12PS4 3.243613 0.0000025 0.0000142
polar 569.14799 Da 188.92 s NA C27H27N3O9S 3.526546 0.0000041 0.0000222
polar 293.11182 Da 74.41 s NA C12H23NO3S2 -2.458849 0.0000050 0.0000262
polar 474.01089 Da 47.04 s NA NA -2.285457 0.0000053 0.0000279
polar 424.14514 Da 52.69 s NA C16H32N4OS4 3.730984 0.0000064 0.0000330
polar 424.14674 Da 52.13 s NA C13H33N2O7PS2 4.566647 0.0000070 0.0000356
polar 507.04174 Da 50.76 s NA C21H5N11O6 4.323988 0.0000071 0.0000358
polar 549.85850 Da 29.05 s NA NA 2.003548 0.0000080 0.0000401
polar 135.98121 Da 29.42 s NA NA 2.214074 0.0000086 0.0000428
polar 153.12665 Da 47.66 s NA C8H15N3 -2.154987 0.0000128 0.0000614
polar 521.19632 Da 52.29 s NA C32H23N7O -2.456207 0.0000130 0.0000618
polar 335.08973 Da 101.06 s NA C9H23NO8P2 -2.062132 0.0000185 0.0000855
polar 535.03645 Da 50.76 s NA C16H6N15O6P 3.506536 0.0000206 0.0000924
polar 225.03063 Da 215.72 s NA C6H11NO6S 2.029406 0.0000227 0.0000990
polar 187.19382 Da 422.38 s NA C11H25NO 3.308400 0.0000287 0.0001207
polar 330.11118 Da 45.92 s NA C19H22OS2 3.578154 0.0000300 0.0001246
polar 195.07332 Da 46.94 s NA NA 4.877764 0.0000347 0.0001397
polar 553.82966 Da 26.74 s NA NA 3.317413 0.0000361 0.0001438
polar 128.04615 Da 784.34 s NA NA -2.470722 0.0000529 0.0001990
polar 429.08512 Da 235.88 s NA C24H15NO7 -4.132066 0.0000562 0.0002090
polar 228.10056 Da 337.15 s NA C12H20S2 -2.283454 0.0000576 0.0002128
polar 144.00508 Da 164.12 s NA NA -2.159048 0.0000594 0.0002182
polar 689.03853 Da 100.75 s NA NA -2.455262 0.0000633 0.0002308
polar 271.09782 Da 40.14 s NA C12H18NO4P -2.257001 0.0000674 0.0002434
polar 356.12601 Da 54.83 s NA C20H20O6 5.769899 0.0000786 0.0002778
polar 412.04905 Da 591.83 s NA NA -2.141486 0.0000828 0.0002909
polar 343.95574 Da 45.53 s NA NA 4.119137 0.0000856 0.0002987
polar 315.10435 Da 40.18 s NA C16H17N3O2S 3.475439 0.0001049 0.0003585
polar 565.83665 Da 29.00 s NA C13H3N4O12PS4 2.313891 0.0001126 0.0003827
polar 406.27013 Da 988.50 s NA C13H34N12OS -2.892391 0.0001250 0.0004199
polar 353.01730 Da 49.08 s NA C10H15N3O5S3 -2.922707 0.0001300 0.0004334
polar 339.35181 Da 1143.69 s NA NA -2.066433 0.0001305 0.0004341
polar 310.25076 Da 1097.05 s NA C19H34O3 -2.065588 0.0001410 0.0004631
polar 197.11625 Da 49.80 s NA C9H15N3O2 -2.190925 0.0001495 0.0004890
polar 107.98851 Da 151.00 s NA NA 3.109918 0.0001550 0.0005045
polar 81.98175 Da 1174.56 s NA NA -3.942862 0.0001720 0.0005488
polar 407.90954 Da 42.95 s NA NA 2.276425 0.0002041 0.0006399
polar 594.18444 Da 187.01 s NA C43H22N4 -2.474780 0.0002277 0.0007060
polar 398.14052 Da 41.90 s NA C11H32N2O7P2S -2.156689 0.0002412 0.0007430
polar 338.11726 Da 129.06 s NA NA -2.279316 0.0003163 0.0009459
polar 74.00025 Da 586.69 s NA NA -2.020343 0.0003212 0.0009545
polar 215.11621 Da 403.33 s NA C10H17NO4 -2.886132 0.0003425 0.0010097
polar 576.13728 Da 560.21 s NA NA -2.372809 0.0003452 0.0010165
polar 56.06225 Da 1178.52 s NA NA -2.653122 0.0004098 0.0011752
polar 159.12589 Da 38.85 s NA NA -2.408645 0.0004291 0.0012246
polar 351.08872 Da 183.84 s NA C7H13N9O8 -3.699313 0.0005237 0.0014622
polar 270.14690 Da 581.44 s NA C14H22O5 -2.294280 0.0008794 0.0023354
polar 706.16532 Da 1050.14 s NA C30H39N6O4PS4 -2.127643 0.0014082 0.0035646
polar 143.16765 Da 179.80 s NA C9H21N 2.617752 0.0014405 0.0036369
polar 162.99232 Da 30.05 s NA NA 3.652277 0.0017044 0.0041816
polar 130.99596 Da 29.67 s NA NA 2.348092 0.0019884 0.0048043
polar 174.53774 Da 211.84 s NA NA -2.595379 0.0026957 0.0063143
polar 311.55770 Da 52.79 s NA NA 2.607354 0.0027307 0.0063777
polar 56.06226 Da 1166.73 s NA NA -2.157214 0.0027661 0.0064498
polar 835.84772 Da 33.22 s NA C25H14O21P6 2.707956 0.0029057 0.0067014
polar 774.52839 Da 1183.19 s NA C32H75N10O7PS 2.806289 0.0033151 0.0075219
polar 641.08784 Da 50.18 s NA C14H32N11O6PS5 2.840442 0.0034151 0.0077244
polar 312.26765 Da 1181.50 s NA NA 2.985652 0.0042354 0.0092726
polar 611.17805 Da 1184.30 s NA C28H38NO8PS2 3.145050 0.0042868 0.0093637

Apolar Phase

Metabolites with adj P-value < 0.01 and absolute log ratio > 2
phase bucket Name Formula logFC p.value padj
apolar 467.00597 Da 39.82 s NA C20H9N3O9S 4.000910 0.0000000 0.0000000
apolar 1052.52432 Da 807.07 s NA C65H84N2S5 -2.084333 0.0000000 0.0000000
apolar 792.56643 Da 505.34 s NA NA -2.839855 0.0000000 0.0000000
apolar 870.56984 Da 803.03 s NA C51H84O7P2 -6.365147 0.0000000 0.0000001
apolar 191.04414 Da 41.28 s NA NA 2.258838 0.0000000 0.0000002
apolar 892.54105 Da 803.03 s NA C52H81N2O4PS2 -4.162065 0.0000001 0.0000004
apolar 244.90652 Da 77.09 s NA NA -2.165518 0.0000001 0.0000009
apolar 195.05665 Da 40.87 s 2-_N-morpholino_ethanesulfonic acid C6H13NO4S 8.600611 0.0000003 0.0000020
apolar 412.09570 Da 40.31 s NA C11H24N8OS4 5.166741 0.0000017 0.0000080
apolar 349.96167 Da 41.67 s NA NA -2.944103 0.0000252 0.0000832
apolar 886.56398 Da 850.98 s NA C34H66N26OS -2.829708 0.0000922 0.0002589
apolar 396.00441 Da 41.68 s NA NA -2.234544 0.0001141 0.0003107
apolar 381.98812 Da 41.95 s NA C18H7O8P -2.782766 0.0001375 0.0003630
apolar 551.42610 Da 842.78 s NA NA -3.244671 0.0003743 0.0008795
apolar 335.94565 Da 41.39 s NA NA -2.142793 0.0007243 0.0015716

Export

## `mutate_if()` ignored the following grouping variables:
## Columns `phase`, `bucket`

T-test results were exported to ./data/processed/metabolomics/20200309_metabolomics_ttest.csv